Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add archive_files to scrape args #129

Merged
merged 5 commits into from
Jun 7, 2024

Conversation

alexobaseki
Copy link
Contributor

@alexobaseki alexobaseki commented Jun 6, 2024

DATA-4855

  • Set is_archive_files argument

Copy link
Contributor

@jessemortenson jessemortenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jessemortenson
Copy link
Contributor

You know my first read was just looking at the code as is, and taking a moment to think I'm realizing you took a different way to implement this than I was imagining. I was imagining that the configuration flag would be in the os-realtime lambda function config. So that if you wanted files to start archiving, you'd change the Lambda config in AWS admin console.

This way should work fine, and it has the advantage of allowing a dev to run an ad hoc scrape with this flag set to test things (without changing behavior of other data flowing through). One disadvantage I can think of is that we'd have to redeploy DAGs (or I guess in the short run update about a dozen task definitions in the OS task defs repo?) if we wanted to switch the behavior for all realtime scraper runs.

Were there other advantages/disadvantages you were thinking of with this approach?

@alexobaseki
Copy link
Contributor Author

You know my first read was just looking at the code as is, and taking a moment to think I'm realizing you took a different way to implement this than I was imagining. I was imagining that the configuration flag would be in the os-realtime lambda function config. So that if you wanted files to start archiving, you'd change the Lambda config in AWS admin console.

This way should work fine, and it has the advantage of allowing a dev to run an ad hoc scrape with this flag set to test things (without changing behavior of other data flowing through). One disadvantage I can think of is that we'd have to redeploy DAGs (or I guess in the short run update about a dozen task definitions in the OS task defs repo?) if we wanted to switch the behavior for all realtime scraper runs.

Were there other advantages/disadvantages you were thinking of with this approach?

Thanks for additional insight. I approached the problem this based on the context(or lack of it). I have on our the system works. I did think of it having that advantage for easy local testing but I think I like your approach better. I am assuming that the lambda function config will be reflected in context args of process_import_function(event, context): such that
if context.file_archiving_enabled: #ARCHIVE_FILE?

Copy link
Contributor

@elseagle elseagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you! A couple of follow-up actions are needed here (just in case you might not be aware):

  1. We need to merge and release this first (all the way to pypi)
  2. Then update openstates-scraper with the latest release of openstates-core via poetry
  3. Optionally: update openstates-realtime with the latest release of openstates-core via poetry, but this is not really important this time because the change in openstates-core is not directly used in the openstates-realtime project i.e we are not calling openstates-core library or any of the changes used directly just communication via SQS and Lambda and that's handled in the openstates realtime PR

@alexobaseki
Copy link
Contributor Author

LGTM, thank you! A couple of follow-up actions are needed here (just in case you might not be aware):

  1. We need to merge and release this first (all the way to pypi)
  2. Then update openstates-scraper with the latest release of openstates-core via poetry
  3. Optionally: update openstates-realtime with the latest release of openstates-core via poetry, but this is not really important this time because the change in openstates-core is not directly used in the openstates-realtime project i.e we are not calling openstates-core library or any of the changes used directly just communication via SQS and Lambda and that's handled in the openstates realtime PR

Thank you Sogo, I probably need to a lecture on how all of this work on a call.

@alexobaseki alexobaseki merged commit b62466f into main Jun 7, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants